
Contextual Bandits with Cross-learning

Author: Balseiro Santiago
Golrezaei Negin
Mahdian Mohammad
Mirrokni Vahab
Schneider Jon
Publication date: 03/01/2020

In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $a$ to perform, and receives some reward $r_{a,t}(c)$. We consider the variant of this problem where, in addition to receiving the reward $r_{a,t}(c)$, the learner also learns the values of $r_{a,t}(c')$ for all other contexts $c'$; i.e., the rewards that would have been achieved by performing that action under different contexts. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions (in this setting the context is the decision maker's private valuation for each auction). We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ regret against all stationary policies, where $C$ is the number of contexts, $K$ the number of actions, and $T$ the number of rounds. We demonstrate algorithms for the contextual bandits problem with cross-learning that remove the dependence on $C$ and achieve regret $O(\sqrt{KT})$ (when contexts are stochastic with known distribution), $\tilde{O}(K^{1/3}T^{2/3})$ (when contexts are stochastic with unknown distribution), and $\tilde{O}(\sqrt{KT})$ (when contexts are adversarial but rewards are stochastic).

Comment: 48 pages, 5 figures

arXiv.org e-Print Archive

DSpace@MIT
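To make the cross-learning feedback in the abstract above concrete, here is a minimal UCB-style sketch in Python. It is an illustration under stated assumptions, not the paper's algorithm: rewards are Bernoulli with hypothetical means `means[c][a]`, one shared uniform draw per round couples an action's rewards across contexts, and contexts are drawn uniformly for the demo. What it shows is the intuition behind removing the dependence on $C$: because playing action $a$ reveals $r_{a,t}(c')$ for every $c'$, a single pull count per action serves all contexts.

```python
import math
import random


def ucb_cross_learning(means, horizon, seed=0):
    """UCB-style learner for Bernoulli rewards with cross-learning feedback.

    means[c][a] is the (hypothetical) success probability of action a in context c.
    Playing action a reveals its reward in every context, so one shared pull
    count per action serves all contexts.
    """
    rng = random.Random(seed)
    n_contexts, n_actions = len(means), len(means[0])
    counts = [0] * n_actions  # shared across contexts
    sums = [[0.0] * n_actions for _ in range(n_contexts)]  # per-context reward sums
    total_reward = 0.0
    for t in range(1, horizon + 1):
        c = rng.randrange(n_contexts)  # contexts drawn uniformly for illustration
        if 0 in counts:
            a = counts.index(0)  # play every action once first
        else:
            a = max(
                range(n_actions),
                key=lambda b: sums[c][b] / counts[b]
                + math.sqrt(2 * math.log(t) / counts[b]),
            )
        u = rng.random()  # one shared draw couples the rewards across contexts
        for c_prime in range(n_contexts):
            # Cross-learning: record the reward action a would have earned in c_prime.
            sums[c_prime][a] += 1.0 if u < means[c_prime][a] else 0.0
        total_reward += 1.0 if u < means[c][a] else 0.0
        counts[a] += 1
    return total_reward, counts
```

For example, `ucb_cross_learning([[0.2, 0.5, 0.9], [0.9, 0.5, 0.2]], 5000)` returns the total reward and the per-action pull counts; the middle action, which is worse in both contexts, should end up pulled far less often than the other two.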


Competition and Yield Optimization in Ad Exchanges

Author: Balseiro Santiago
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2013

Ad Exchanges are emerging Internet markets where advertisers may purchase display ad placements, in real time and based on specific viewer information, directly from publishers via a simple auction mechanism. The presence of such channels presents a host of new strategic and tactical questions for publishers. How should the supply of impressions be divided between bilateral contracts and exchanges? How should auctions be designed to maximize profits? What is the role of user information and to what extent should it be disclosed? In this thesis, we develop a novel framework to address some of these questions. We first study how publishers should allocate their inventory in the presence of these new markets when traditional reservation-based ad contracts are available. We then study the competitive landscape that arises in Ad Exchanges and the implications for publishers' decisions. Traditionally, an advertiser would buy display ad placements by negotiating deals directly with a publisher and signing an agreement, called a guaranteed contract. These deals usually take the form of a specific number of ad impressions reserved over a particular time horizon. In light of the growing market of Ad Exchanges, publishers face new challenges in choosing between the allocation of contract-based reservation ads and spot market ads. In this setting, the publisher should take into account the tradeoff between short-term revenue from an Ad Exchange and the long-term impact of assigning high-quality impressions to the reservations (typically measured by the click-through rate). In the first part of this thesis, we formalize this combined optimization problem as a stochastic control problem and derive an efficient policy for online ad allocation in settings with a general joint distribution over placement quality and exchange bids, where the exchange bids are assumed to be exogenous and independent of the decisions of the publishers.
We prove asymptotic optimality of this policy for an arbitrary trade-off between the quality of delivered reservation ads and revenue from the exchange, and provide a bound for its convergence rate to the optimal policy. We also give experimental results on data derived from real publisher inventory, showing that our policy can achieve any Pareto-optimal point on the quality vs. revenue curve. In the second part of this thesis, we relax the assumption of exogenous bids in the Ad Exchange and study in more detail the competitive landscape that arises in Ad Exchanges and the implications for publishers' decisions. Typically, advertisers join these markets with a pre-specified budget and participate in multiple second-price auctions over the length of a campaign. We introduce the novel notion of a Fluid Mean Field Equilibrium (FMFE) to study the dynamic bidding strategies of budget-constrained advertisers in these repeated auctions. This concept is based on a mean field approximation to relax the advertisers' informational requirements, together with a fluid approximation to handle the complex dynamics of the advertisers' control problems. Notably, we are able to derive a closed-form characterization of FMFE, which we use to study the auction design problem from the publisher's perspective, focusing on three design decisions: (1) the reserve price; (2) the supply of impressions to the Exchange versus an alternative channel such as bilateral contracts; and (3) the disclosure of viewers' information. Our results provide novel insights with regard to key auction design decisions that publishers face in these markets. In the third part of this thesis, we justify the use of the FMFE as an equilibrium concept in this setting by proving that the FMFE provides a good approximation to the rational behavior of agents in large markets. To do so, we consider a sequence of scaled systems with increasing market size.
In this regime, we show that, when all advertisers implement the FMFE strategy, the relative profit obtained from any unilateral deviation that keeps track of all available information in the market becomes negligible as the scale of the market increases. Hence, an FMFE strategy indeed becomes a best response in large markets.

Columbia University Academic Commons
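The FMFE in the thesis abstract above rests on two approximations: each bidder faces a stationary distribution of highest competing bids (the mean-field part), and the budget is enforced in expectation per auction (the fluid part). The Python sketch below illustrates only that idea for a single bidder in second-price auctions. It assumes, as a hypothetical simplification, that the bidder shades bids as $v/(1+\mu)$ and picks the smallest $\mu \ge 0$ whose average spend over sampled (value, competing bid) pairs fits the per-auction budget; the function names are illustrative, and the thesis's closed-form characterization is not reproduced here.

```python
def expected_spend(mu, values, competing_bids):
    """Average second-price payment of a bidder who bids v / (1 + mu)."""
    spend = 0.0
    for v, d in zip(values, competing_bids):
        if v / (1.0 + mu) > d:  # win when the shaded bid beats the highest rival bid
            spend += d  # second-price rule: pay the highest competing bid
    return spend / len(values)


def pacing_multiplier(values, competing_bids, budget_per_auction, tol=1e-6):
    """Bisection for the smallest mu >= 0 whose average spend fits the budget rate.

    Assumes budget_per_auction > 0, so a large enough mu always satisfies it.
    """
    if expected_spend(0.0, values, competing_bids) <= budget_per_auction:
        return 0.0  # budget does not bind: bid truthfully
    lo, hi = 0.0, 1.0
    while expected_spend(hi, values, competing_bids) > budget_per_auction:
        hi *= 2.0  # grow the bracket until the spend fits the budget
    # Invariant: spend(lo) > budget >= spend(hi); spend is nonincreasing in mu.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_spend(mid, values, competing_bids) > budget_per_auction:
            lo = mid
        else:
            hi = mid
    return hi
```

Because average spend can only fall as $\mu$ grows, bisection is enough; when the budget never binds, the sketch returns $\mu = 0$, i.e., truthful bidding.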

Contextual Standard Auctions with Budgets: Revenue Equivalence and Efficiency Guarantees

Author: Balseiro Santiago
Kroer Christian
Kumar Rachitesh
Publication date: 11/05/2022

The internet advertising market is a multi-billion-dollar industry, in which advertisers buy thousands of ad placements every day by repeatedly participating in auctions. In recent years, the industry has shifted to first-price auctions as the preferred paradigm for selling advertising slots. Another important and ubiquitous feature of these auctions is the presence of campaign budgets, which specify the maximum amount the advertisers are willing to pay over a specified time period. In this paper, we present a new model to study the equilibrium bidding strategies in standard auctions, a large class of auctions that includes first- and second-price auctions, for advertisers who satisfy budget constraints on average. Our model dispenses with the common yet unrealistic assumption that advertisers' values are independent and instead assumes a contextual model in which advertisers determine their values using a common feature vector. We show the existence of a natural value-pacing-based Bayes-Nash equilibrium under very mild assumptions. Furthermore, we prove a revenue equivalence showing that all standard auctions yield the same revenue even in the presence of budget constraints. Leveraging this equivalence, we prove Price of Anarchy bounds for liquid welfare and structural properties of pacing-based equilibria that hold for all standard auctions. Our work takes an important step toward understanding the implications of the shift to first-price auctions in internet advertising markets.

arXiv.org e-Print Archive
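The paper above proves revenue equivalence for standard auctions in a contextual model with budgets. As a point of reference only, the Python sketch below checks the classical special case without budgets: values i.i.d. uniform on $[0, 1]$ and the textbook symmetric equilibria (truthful bidding in the second-price auction, bids $\frac{n-1}{n}v$ in the first-price auction). In that setting both formats earn $\frac{n-1}{n+1}$ in expectation. This is a swapped-in textbook model, not the paper's contextual, budget-constrained one.

```python
import random


def simulate_revenues(n_bidders, n_auctions, seed=0):
    """Monte Carlo revenue of first- and second-price auctions (no budgets).

    Values are i.i.d. Uniform[0, 1]; bidders play the textbook symmetric
    equilibria: truthful bids in the second-price auction and
    b(v) = (n - 1) / n * v in the first-price auction.
    """
    rng = random.Random(seed)
    shade = (n_bidders - 1) / n_bidders
    first_total = second_total = 0.0
    for _ in range(n_auctions):
        values = sorted(rng.random() for _ in range(n_bidders))
        first_total += shade * values[-1]  # winner pays own shaded bid
        second_total += values[-2]  # winner pays the second-highest value
    return first_total / n_auctions, second_total / n_auctions
```

With three bidders, `simulate_revenues(3, 100000)` returns two averages that should both be close to $\frac{2}{4} = 0.5$.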

Single-Leg Revenue Management with Advice

Author: Balseiro Santiago
Kroer Christian
Kumar Rachitesh
Publication date: 09/10/2022

Single-leg revenue management is a foundational problem of revenue management that has been particularly impactful in the airline and hotel industries: given $n$ units of a resource, e.g., flight seats, and a stream of sequentially arriving customers segmented by fares, what is the optimal online policy for allocating the resource? Previous work focused on designing algorithms when forecasts are available, which are not robust to inaccuracies in the forecast, or online algorithms with worst-case performance guarantees, which can be too conservative in practice. In this work, we look at the single-leg revenue management problem through the lens of the algorithms-with-advice framework, which attempts to harness the increasing prediction accuracy of machine learning methods by optimally incorporating advice about the future into online algorithms. In particular, we characterize the Pareto frontier that captures the tradeoff between consistency (performance when advice is accurate) and competitiveness (performance when advice is inaccurate) for every advice. Moreover, we provide an online algorithm that always achieves performance on this Pareto frontier. We also study the class of protection level policies, which is the most widely deployed technique for single-leg revenue management: we provide an algorithm to incorporate advice into protection levels that optimally trades off consistency and competitiveness. In addition, we empirically evaluate the performance of these algorithms on synthetic data. We find that our algorithm for protection level policies performs remarkably well on most instances, even if it is not guaranteed to be on the Pareto frontier in theory. Our results extend to other unit-cost online allocation problems, such as display advertising and the multiple secretary problem.

arXiv.org e-Print Archive
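Protection level policies, mentioned in the abstract above, reserve part of the capacity for high-fare customers. A standard way to set that level from a forecast is Littlewood's two-fare rule, sketched below in Python. This is the classical textbook rule, not the paper's advice-augmented algorithm; treating `high_demand_pmf` as the "advice" and assuming that all low-fare customers arrive before high-fare ones are illustrative assumptions, and both function names are hypothetical.

```python
def littlewood_protection_level(capacity, high_fare, low_fare, high_demand_pmf):
    """Littlewood's rule for two fare classes.

    Protect the y-th unit for high-fare customers while its expected
    marginal revenue, high_fare * P(D_high >= y), exceeds low_fare.
    high_demand_pmf[k] is the forecast probability that D_high == k.
    """
    protect = 0
    for y in range(1, capacity + 1):
        tail = sum(p for k, p in enumerate(high_demand_pmf) if k >= y)
        if high_fare * tail > low_fare:
            protect = y
        else:
            break  # the tail probability never increases with y
    return protect


def serve(capacity, protect, low_requests, high_requests, high_fare, low_fare):
    """Low-fare customers arrive first and are accepted only above the protection level."""
    low_sold = min(low_requests, max(capacity - protect, 0))
    high_sold = min(high_requests, capacity - low_sold)
    return low_sold * low_fare + high_sold * high_fare
```

With 10 seats, fares of 200 and 100, and a forecast that high-fare demand is uniform on 0 to 10, the rule protects 5 seats. `serve(10, 5, 8, 3, 200, 100)` then earns 1100, while protecting nothing earns 1200 on that particular demand realization, a reminder that the rule targets expected rather than realized revenue.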

Online Resource Allocation under Horizon Uncertainty

Author: Balseiro Santiago
Kroer Christian
Kumar Rachitesh
Publication date: 02/11/2022

We study stochastic online resource allocation: a decision maker needs to allocate limited resources to stochastically generated, sequentially arriving requests in order to maximize reward. At each time step, requests are drawn independently from a distribution that is unknown to the decision maker. Online resource allocation and its special cases have been studied extensively in the past, but prior results crucially and universally rely on the strong assumption that the total number of requests (the horizon) is known to the decision maker in advance. In many applications, such as revenue management and online advertising, the number of requests can vary widely because of fluctuations in demand or user traffic intensity. In this work, we develop online algorithms that are robust to horizon uncertainty. In sharp contrast to the known-horizon setting, no algorithm can achieve even a constant asymptotic competitive ratio that is independent of the horizon uncertainty. We introduce a novel generalization of dual mirror descent which allows the decision maker to specify a schedule of time-varying target consumption rates, and prove corresponding performance guarantees. We go on to give a fast algorithm for computing a schedule of target consumption rates that leads to near-optimal performance in the unknown-horizon setting. In particular, our competitive ratio attains the optimal rate of growth (up to logarithmic factors) as the horizon uncertainty grows large. Finally, we also provide a way to incorporate machine-learned predictions about the horizon that interpolates between the known- and unknown-horizon settings.

arXiv.org e-Print Archive
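The abstract above builds on dual mirror descent with a schedule of target consumption rates. The Python sketch below is a simplified single-resource version under stated assumptions: mirror descent with a Euclidean mirror map (plain projected subgradient steps on one dual price), requests given as hypothetical `(reward, cost)` pairs, and a caller-supplied schedule `targets`. It does not compute the near-optimal schedule the paper describes.

```python
def dual_descent_allocation(requests, budget, targets, step_size):
    """Accept/reject requests with a single dual price and a target spend schedule.

    requests: list of (reward, cost) pairs arriving online.
    targets:  targets[t] is the desired consumption in round t (hypothetical schedule).
    Uses projected subgradient descent, i.e. mirror descent with a Euclidean mirror map.
    """
    mu = 0.0  # dual price of the resource
    remaining = budget
    total_reward = 0.0
    for t, (reward, cost) in enumerate(requests):
        # Accept when the reward beats the priced cost and the resource suffices.
        take = reward - mu * cost > 0 and cost <= remaining
        consumed = cost if take else 0.0
        if take:
            remaining -= cost
            total_reward += reward
        # Consuming above target raises the price; consuming below lowers it.
        mu = max(0.0, mu - step_size * (targets[t] - consumed))
    return total_reward, remaining
```

A constant schedule `targets = [budget / T] * T` corresponds to the usual known-horizon pacing target; front-loading or back-loading the schedule changes how aggressively the budget is spent early, which is the lever the paper's schedule design adjusts.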

The Modulation of the Granular Exo-Endocytic Cycle as a Mechanism of Immunological Plasticity in the Mast Cell

Author: Balseiro Gómez Santiago
Publication date: 24/03/2017

idUS. Depósito de Investigación Universidad de Sevilla